First I would like to congratulate the three authors for a very nice paper.
During a visit to Eindhoven in 2010, Botond Szabo and Harry van Zanten
mentioned the first steps of this work to me, which concerned the
understanding of certain empirical Bayes procedures in the white noise
setting. Since then, together with Aad van der Vaart, they have broadened
their original goals and have produced an impressive and very interesting
series of papers on the subject. The present paper is indeed one aspect of
a larger body of work, and we will mention a few connections with these
related papers below.

The authors start from the signal in white noise model, which, after
projection onto an appropriate basis, typically related to the SVD of the
operator K of the inverse problem, is translated into a sequence formulation.
They choose a prior distribution that makes coordinates independent:

(1) $\Pi_\alpha \sim \bigotimes_{i \ge 1} N(0, i^{-1-2\alpha})$.

If the true parameter belongs to a regularity space defined from a decay
of coefficients in the previous basis, the authors prove that certain credible
sets constructed from the posterior distribution coupled with a
(marginal-likelihood) empirical Bayes (EB) procedure for $\alpha$ achieve
excellent performance: they are honest confidence sets with adaptive, optimal
asymptotic diameter if one restricts to certain classes of “self-similar”-type
true parameters. These are the first results of this type in Bayesian
nonparametrics.

We organize this discussion around two main themes: 

1. Priors for Bayesian credible sets. 

2. Bayesian credible regions and simulations. 
1. Priors for Bayesian credible sets. Several aspects of the prior scheme
(1) are investigated by the authors in [10], [9] together with Bartek Knapik
and in [14]. In [10], a fixed regularity parameter $\alpha$ is considered; in [9],
adaptive contraction rates are derived. In [14], the prior (1) is used for
fixed $\alpha$ and the use of a different empirical Bayes scheme is advocated.

Related priors. Staying with priors defined on the SVD of K, some
other adaptation schemes have been considered recently. One is (see [13])

(2) $\Pi_\tau \sim \bigotimes_{i \ge 1} N(0, \tau^2 i^{-1-2\alpha})$, $\tau > 0$,

and adaptation is made by empirical Bayes or full Bayes on $\tau$.

Another prior is obtained by setting, for a sequence $\{\lambda_i\}_{i \ge 1}$ of positive
nondecreasing real numbers,

(3) $\Pi_t \sim \bigotimes_{i \ge 1} N(0, e^{-t\lambda_i})$, $t > 0$.

In the case where K is the identity and, for example, … this falls into
the framework considered in [2], where a full Bayes method is considered by
putting a well-chosen hyperprior on $t$.

A natural question is whether the same construction as in the paper with a 
slightly blown up $L^2$-ball and regularity estimated by empirical or full Bayes
would work the same for the priors (2) or (3), with self-similarity constraints 
expressed in a similar way. One can conjecture that the answer is yes and 
that one may study the empirical Bayes procedure from the explicit form of 
the marginal likelihood. 

Related priors and sharp rates. Rates of convergence for Bayes procedures
are sometimes shown to be optimal up to a slowly varying factor in
n, for instance, logarithmic. In some cases it is not so clear whether such
a logarithmic term should be present in the rate or not. The present work
points to interesting questions in this respect, with connections to the
related prior schemes (2)-(3).

For prior (2), it is shown in [13] that the minimax rate in $L^2$
over hyperrectangles is achieved by the marginal-likelihood empirical Bayes
procedure. This comes, however, at a cost: one should assume that the true
regularity $\beta$ of the signal satisfies $\beta < 1/2 + \alpha$, for $\alpha$ the regularity
parameter in (2); otherwise the (uniform) rate can be shown to be suboptimal.

For prior (3), we obtained in [2] the rate $(\log n / n)^{\beta/(2\beta+1)}$ in $L^2$ over a class
containing hyperrectangles and for which the minimax rate is $n^{-\beta/(2\beta+1)}$,
so without the log-term, thus showing the unavoidable loss of a logarithmic
factor when using prior (3).

In [9], the authors obtain an upper-bound rate for prior (1) in $L^2$ that
contains a logarithmic factor. However, Proposition 3.8 of the present paper
shows that the radius of the credible set is proportional to the minimax rate, while
Theorem 3.6 implies coverage of the credible set for polished tail parameters.
Combining these results, one deduces that the posterior mean lies within a
distance of $\theta_0$, in $\|\cdot\|_2$, of the order of the minimax rate. This
presumably implies that the posterior
itself converges at the minimax rate, without extra log-terms, if the true
$\theta_0$ has polished tails. One may conjecture that this is also true without the
polished tail assumption. If so, it would be interesting to better understand
what makes priors (1)-(3) behave differently.
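
The step combining radius and coverage can be spelled out in one line. This is only a sketch: $\varepsilon_n$ is shorthand introduced here for the minimax rate, $L$ stands for the blow-up constant of the credible ball, and the implication holds on the event that the ball covers $\theta_0$.

```latex
% On the event that the (blown-up) credible ball covers \theta_0:
\theta_0 \in \bigl\{\theta : \|\theta - \hat\theta_{n,\hat\alpha_n}\|_2
      \le L\, r_{n,\gamma}(\hat\alpha_n)\bigr\}
\;\Longrightarrow\;
\|\hat\theta_{n,\hat\alpha_n} - \theta_0\|_2
      \le L\, r_{n,\gamma}(\hat\alpha_n) \lesssim \varepsilon_n .
```

Since the covering event has probability close to the nominal level rather than close to one, the bound is best read as holding with correspondingly high probability.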

Different priors and conditions. The prior scheme (1) is, by definition,
somewhat tied to the SVD of K. As this type of basis may not be
well-localized, this may cause some difficulties if the goal is a result in terms
of a different loss function than $L^2$.

Also, smoothness classes for $f_0$ are defined in terms of this basis and thus
connected to K. This may not always correspond to natural assumptions of 
the practical problem at hand; see, for instance, [6]. The same can probably 
be said about the polished tail or self-similarity conditions. As they stand, 
they refer to coefficients in the basis associated to K, which may not always 
be canonical. 

For these reasons, it would probably be interesting for future works to 
consider different types of priors. It is unclear whether in general a direct 
analysis of the explicit expression of the likelihood (and marginal likelihood 
for the EB approach) will be possible. It would certainly be desirable, if
possible, to develop some general understanding of empirical Bayes methods.
On the other hand, it would also be interesting to develop indirect (or
qualitative) techniques, similar to those of the meta-theorem of [7] for these
problems. Although this may not be easy for inverse problems, some recent
work on these includes [11] and [8]. Other recent results on functionals using
arguments allowing implicit expressions can be found in [5] and [1].

Different approaches to nonparametric credible sets. As the
authors mention at the end of their introduction, for parametric models
the Bernstein-von Mises (BvM) theorem is a canonical tool to justify that
Bayesian credible sets are frequentist confidence sets. In [3] and [4], R. Nickl
and I proposed a possible approach for the nonparametric BvM and
showed that it could be applied to the construction of fixed-regularity
nonparametric confidence sets. I am not sure I understand the authors’ sentence
“no method that avoids dealing with the bias-variance trade-off will properly
quantify the uncertainty... current practice.” In [3] and [4], no adaptation
claims were made, and the confidence sets there are for fixed regularity,
although the proposed methodology to build such sets does not per
se exclude adaptive priors. Recently, a first application of this programme
with adaptive priors in white noise was carried out in [12], leading to $L^2$
and $L^\infty$ adaptive confidence sets computable in practice, under appropriate
self-similarity conditions. The “bias-variance” trade-off mentioned by the
authors, I guess, typically appears when estimating the “regularity” of the
signal, for instance, by an empirical Bayes technique.

Bias-variance trade-off and choice of the prior. There are several
interesting questions mentioned by the authors beyond the $L^2$-results
of the paper. One is obtaining Bayesian confidence sets for other norms,
related to the problem of estimating certain functionals, such as the value
of the function at a point; see the discussion on these in [9] for the prior
(1). Another question is building different types of adaptive $L^2$-confidence
sets, where the regularity is assumed to belong to an interval $[\alpha, 2\alpha]$, as
considered in [14], again with the scheme (1).

In both cases the authors seem to conclude that marginal likelihood
empirical Bayes or full Bayes methods have some trouble, related to the choice
of the regularity parameter: for instance, the marginal-likelihood EB method
does not seem to perform the correct bias-variance trade-off in the two
problems. The proposed solution is then to choose the tuning parameter $\hat\alpha_n$
independently, by a possibly non-Bayes method. We agree, but one may note
that all these results are for the given prior scheme (1). Is it not conceivable
that, for a given problem (e.g., adaptive estimation of a functional), there
exists a prior for which the two steps are performed optimally? Perhaps this
is too much to ask in general, but, after all, this is the remarkable result
that the authors show in the present paper: at least for the present problem,
the Bayes method performs well in (1) rate-adaptation and (2) providing an
(EB-)estimate $\hat\alpha_n$ so that the confidence set has the desired coverage.

2. Bayesian credible sets and simulations. The authors present interesting
simulations and a representation of the credible sets in the case of the
Volterra operator.

What exactly is a plot of a credible set? The credible ball considered
in the paper is, with L = 1,

(4) $C_n = \{\theta : \|\theta - \hat\theta_{n,\hat\alpha_n}\|_2 \le r_{n,\gamma}(\hat\alpha_n)\}$.

In their Figure 1, the authors plot random draws from the posterior distri¬ 
bution. The idea is that all (but possibly a few) of these draws belong to 
the credible ball. From this definition, we can make two comments: 

1. Curves that are not typical posterior draws belong to $C_n$.

2. There is typically much more “information” in the posterior (coming
from the prior) than the fact of belonging to such an $L^2$-ball.

[Figure 1. Panel titles: “n=1000, Draws in ball” (left) and “n=1000, Posterior Draws” (right).]

Fig. 1. In gray, on each plot, N = 50 sampled curves from the posterior distribution
(right column) and the law induced by (5) (left column), for n = 10^… (top) and n = 10^…
(bottom). Posterior mean and true function are in blue and black, respectively.

To illustrate the fact that $C_n$ is in some sense larger than the “support” of
the posterior distribution, we have generated random draws within $C_n$ using
a distribution different from the posterior. First, consider the sequence, given
the data,

$\rho = (\rho_k)_{k \ge 1} = \Bigl(\hat\theta_{n,\hat\alpha_n,k} + a\,\frac{\zeta_k}{(k \log k)^{\dots}}\Bigr)_{k \ge 1}$,

for $a > 0$ some small constant and $\zeta_k$ i.i.d. $N(0,1)$ variables. Consider the
law 

(5) $\mathcal{L}(\rho \mid \rho \in C_n)$,

the distribution of $\rho$ conditioned to belong to the set $C_n$. Curves whose
coefficients are sampled from this law are represented in the left column of
Figure 1, where we took $a$ = …, while the right column corresponds
to posterior draws. One notices that the typical curves on the left are more
“wiggly” than those from the posterior distribution and also tend to spread
more, depending on how many curves N are simulated, here N = 50.

On the other hand, the posterior distribution itself admits a series of
features that are not necessarily present in a typical element of the $L^2$-ball.
For instance, if $f$ is a draw from the posterior on the signal function, and if
$\hat\alpha_n$ concentrates, which is the case for self-similar-type truths, the supremum
norm $\|f - \hat f_{n,\hat\alpha_n}\|_\infty$ is a stochastically bounded quantity that only depends
on the data via $\hat\alpha_n$, as can be seen from equation (6) below. So with high
probability the posterior draws stay within a tube centered at the posterior
mean. If $\alpha > 1/2$, one could presumably also prove at least some supremum-norm
consistency of the posterior around $f_0$, following, for example, [5].

Given that the mathematical definition of the credible set is (4), it seems 
natural to ask whether one should report draws from the posterior or from (5).
Or rather, would it be possible to define a credible set directly from the
posterior draws themselves, instead of reporting a full $L^2$-ball, while still
retaining the desired coverage properties? 

Improving on the estimate of the radius. The authors simulate
N = 2000 draws from the empirical Bayes posterior and retain the $1 - \gamma = 95\%$
closest to the posterior mean. This means that an implicit “built-in”
estimator of the radius of the credible set is used. More precisely, if $R_1, \ldots, R_N$
denote the observed $L^2$-radii of N draws under the posterior $\Pi_{\hat\alpha_n}[\cdot \mid X]$, only
the curves with radius, respectively, $R_{(1)} \le \cdots \le R_{([0.95N])}$ are retained. In
other words, $R_{([0.95N])}$ is used as an estimator of $r_{n,\gamma}(\hat\alpha_n)$.
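
The “built-in” estimator just described can be sketched in a few lines. This is a minimal illustration rather than the authors' program: the function name `builtin_radius` is hypothetical, and the coordinatewise standard deviations `post_sd` are a toy choice made only so that the example produces some radii.

```python
import numpy as np

def builtin_radius(radii, level=0.95):
    """Order-statistic estimate of the radius and mask of retained draws.

    Keeps the floor(level * N) draws closest to the centre and returns the
    largest retained radius, i.e. R_([level * N]).
    """
    radii = np.asarray(radii, dtype=float)
    n_keep = int(np.floor(level * radii.size))
    order = np.argsort(radii)
    keep = np.zeros(radii.size, dtype=bool)
    keep[order[:n_keep]] = True
    return radii[order[n_keep - 1]], keep

# Toy centred "posterior" draws (ASSUMPTION: independent Gaussian coordinates
# with an arbitrary decaying standard deviation, not taken from the paper).
rng = np.random.default_rng(0)
i = np.arange(1, 201)
post_sd = 1.0 / np.sqrt(i ** 3.0 + 1000.0)
draws = rng.normal(size=(2000, i.size)) * post_sd
radii = np.linalg.norm(draws, axis=1)   # L2 distances to the centre

r_hat, keep = builtin_radius(radii)
print(f"retained {keep.sum()} of {radii.size} draws, built-in radius {r_hat:.3f}")
```

With N = 2000 and a 95% level, 1900 draws are retained, and the estimate of the radius is simply the largest of their distances to the centre.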

This methodology is simple and certainly reasonable for relatively large
N, the precision of the “built-in” quantile estimator being of order $N^{-1/2}$.
In case one likes to be precise about the $(1 - \gamma)$-coverage or, in cases where
the posterior can only be approximated, if one wants to detect possible
outliers, one may suggest an improvement based on a separate estimation of
$r_{n,\gamma}(\hat\alpha_n)$. First, one may note that, in general, the posterior distribution of
the radius could be more easily accessible (or sampling from it could require
less computing time) than the full posterior. In the considered white noise
model example, computing a precise approximation of $r_{n,\gamma}(\hat\alpha_n)$ is simple, as
the posterior distribution re-centered at the posterior mean has distribution,
if $T_n$ is the map $\theta \mapsto \theta - \hat\theta_{n,\hat\alpha_n}$,



It is then straightforward to simulate the random variable $\|\zeta\|_2$, where $\zeta$ is
a draw from the distribution in the last display, and then estimate $r_{n,\gamma}(\hat\alpha_n)$
based, for example, on a quantile as before, but this time using a much
larger sample size (not necessarily N = 2000 as before). This can be done
before running the program simulating the posterior draws of the function
$f$. For instance, in the Volterra example with n = 1000, one obtains the
estimate $\hat r_{n,\gamma}(1) := 0.42 \approx r_{n,\gamma}(1)$ using a sample of size $10^{\dots}$ [we set $\alpha = 1$ for
simplicity, but an approximation of $r_{n,\gamma}(\hat\alpha_n)$ is obtained similarly, as soon
as $\hat\alpha_n$ has been computed].
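
A separate estimation of the radius along these lines might look as follows. This is a sketch under explicit assumptions, not the program used for the numbers reported here. It assumes the sequence model $X_i = \kappa_i\theta_i + n^{-1/2}Z_i$ with prior (1), for which the standard conjugate computation gives independent centred posterior coordinates with variance $1/(i^{1+2\alpha} + n\kappa_i^2)$. It also assumes the Volterra singular values $\kappa_i = 1/((i - 1/2)\pi)$ and a truncation at `n_coef` coefficients. The function names and default sizes are hypothetical.

```python
import numpy as np

def posterior_variances(n, alpha, n_coef=300):
    """Coordinatewise variances of the centred posterior (ASSUMED model).

    Assumes X_i = kappa_i * theta_i + Z_i / sqrt(n), prior N(0, i^(-1-2*alpha)),
    and Volterra singular values kappa_i = 1 / ((i - 1/2) * pi).
    """
    i = np.arange(1, n_coef + 1, dtype=float)
    kappa = 1.0 / ((i - 0.5) * np.pi)
    return 1.0 / (i ** (1.0 + 2.0 * alpha) + n * kappa ** 2)

def radius_quantile(n, alpha, level=0.95, n_sim=20000, n_coef=300, seed=0):
    """Monte Carlo estimate of the level-quantile of ||zeta||_2."""
    rng = np.random.default_rng(seed)
    sd = np.sqrt(posterior_variances(n, alpha, n_coef))
    zeta = rng.normal(size=(n_sim, n_coef)) * sd   # centred posterior draws
    norms = np.linalg.norm(zeta, axis=1)
    return float(np.quantile(norms, level))

r_precise = radius_quantile(n=1000, alpha=1.0)
print(f"estimated radius for n = 1000, alpha = 1: {r_precise:.3f}")
```

Only Gaussian coordinates are simulated and no curve is ever evaluated, so `n_sim` can be taken much larger than the number of posterior curves that are plotted.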

We have run a few iterations of the algorithm proposed by the authors,
with the previous slight modification and setting $\alpha = 1$ for simplicity. As
the estimate of the radius is improved, the rule for discarding draws is more
precise. For the results in Table 1, we have taken the precise estimate
$\hat r_{n,\gamma}(1)$ as “true.”
Table 1

Experiment using the original algorithm compared to a program with separate precise
estimation $\hat r_{n,\gamma}(1)$ (taken as “true”) of $r_{n,\gamma}(1)$. After 10 repetitions, $N_{fp}$ is the mean
number of “incorrectly” retained curves (false positive) by the original algorithm and $N_{fn}$ of
“incorrectly” discarded curves (false negative). In parentheses, percentage of occurrence

n        1000                   10^…                   10^…
N        500       2000         500       2000         500       2000
N_fp     6 (40%)   5 (70%)      4 (50%)   6 (40%)      6 (50%)   4 (20%)
N_fn     3 (50%)   14 (20%)     6 (50%)   12 (50%)     3 (50%)   8 (80%)


As shown in Table 1, a few curves per experiment typically were either 
incorrectly included or excluded. Quantitatively, the number of such curves is 
not very high, but, on the other hand, these are the curves the farthest away 
from the posterior mean, so visually this has (sometimes) some impact on 
the pictures. This observation can be applied as well for pictures of credible
bands, as recently considered, for example, in [12].

Congratulations again to the authors for their inspiring series of works.
Developing tools to build Bayesian credible sets for other models and priors
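
The bookkeeping behind the two counts can be sketched as follows. This is an illustrative reconstruction of the comparison, with the hypothetical function name `count_discrepancies`: a draw is a false positive if the built-in rule retains it although its radius exceeds the reference radius, and a false negative if the built-in rule discards it although its radius is within the reference radius.

```python
import numpy as np

def count_discrepancies(radii, r_true, level=0.95):
    """Count (false positives, false negatives) of the built-in rule.

    The built-in rule keeps the floor(level * N) smallest radii; the
    reference rule keeps every draw with radius <= r_true.
    """
    radii = np.asarray(radii, dtype=float)
    n_keep = int(np.floor(level * radii.size))
    kept = np.zeros(radii.size, dtype=bool)
    kept[np.argsort(radii)[:n_keep]] = True
    inside = radii <= r_true
    n_fp = int(np.sum(kept & ~inside))    # retained but outside the ball
    n_fn = int(np.sum(~kept & inside))    # discarded but inside the ball
    return n_fp, n_fn

# Small deterministic illustration: radii 0.05, 0.10, ..., 1.00.
example_radii = np.arange(1, 21) / 20.0
print(count_discrepancies(example_radii, r_true=0.92))
print(count_discrepancies(example_radii, r_true=2.0))
```

Because both rules keep an initial segment of the ordered radii, at most one of the two counts can be nonzero in a given run, which is consistent with the occurrence percentages in each column of Table 1 adding up to at most 100%.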
is a very interesting topic, and we expect to see more on the subject soon. 